Abstract

With the imminent completion of the Human Genome Project, biomedical research is being revolutionised by the ability to carry out
investigations on a genome-wide scale. This is particularly important in cancer, a disease that is caused by abnormalities in
the sequence and expression of a number of critical genes. Gene expression microarray technology is gaining increasingly widespread use
as a means to determine the expression of potentially all human genes at the level of messenger RNA. In this commentary, we review
developments in gene expression microarray technology and illustrate the progress and potential in cancer biology,
pharmacology, and drug development. Important applications include: (a) development of a more global understanding of the gene
expression abnormalities that contribute to malignant progression; (b) discovery of new diagnostic and prognostic indicators and biomarkers
of therapeutic response; (c) identification and validation of new molecular targets for drug development; (d) provision of an improved
understanding of the molecular mode of action during lead identification and optimisation, [...]
on-target versus off-target effects; (e) prediction of potential side-effects during preclinical development and toxicology studies; (f)
confirmation of a molecular mode of action during hypothesis-testing clinical trials; (g) identification of genes involved in conferring drug
sensitivity and resistance; and (h) prediction of patients most likely to benefit from the drug and use in general pharmacogenomic studies.
As a result of further technological improvements, the use of gene expression microarrays will become a
routine tool for cancer and biomedical research. © 2001 Elsevier Science Inc. All rights reserved.
1. Introduction and rationale

1.1. Advent of post-genomic biology

In February of this year, we witnessed arguably one of 
the most monumental achievements in biology — the simul- 
taneous publication of around 93% of the sequence of the 
human genome by a public sector consortium and a private 
company [1,2]¹. The sequence has revealed many surprises,
not least that the 3 billion base pairs likely encode around
30,000-40,000 genes, rather fewer than some of the earlier
predictions of 100,000 or more [3] and barely twice as many as
the fruit fly. On the other hand, the structure of human genes is
more complex, incorporating multiple vertebrate-specific functional
domains into sophisticated protein products, with
further diversity being provided by alternative splicing. One
of the aims behind the strategy of genomic sequencing is to
provide an inventory of all the genes and regulatory sequences
required to build an organism. The sequencing
continues apace in both the public and private sectors, and
complete sequences of mouse, rat, and other genomes will
follow shortly (see http://www.sanger.ac.uk/ or
http://www.ncbi.nlm.nih.gov/Genomes/index.html). However, the
deposition of billions of bases of sequence into the databases
cannot be considered the final goal, as we are unable to
define biological function, or disease pathology, purely
from a genomic DNA sequence. The repositories of genes
and their regulatory sequences represent the starting point of
the new challenge of post-sequence functional genomics,
which is to understand how these components interact and
function [4].

¹ Due to the large number of contributors to Ref. [2] (more than 250
authors), the reader is referred to the original article for a complete listing
of the authors' names.

Importantly, in terms of practical application, annotation 
of the human genome is certain to produce benefits not only 
for understanding basic biology, but also for identifying the 
molecular basis of disease and for accelerating the rate of 
drug discovery and development. Alongside high through- 
put sequencing to determine normal and abnormal gene 
structure, the use of microarray technology to measure gene 
expression patterns on a global scale is now poised to 
revolutionise the discovery and use of new medical treat- 
ments. The application of gene expression microarrays in 
cancer biology, pharmacology, and drug development is the 
subject of this review. 

1.2. Gene expression profiling in cancer biology and 
treatment 

The use of gene expression microarrays is particularly 
important in cancer. This is because the accumulation and 
combinatorial effects of abnormalities that drive the initia- 
tion and malignant progression of cancer result from the 
altered sequence or expression level of cancer-causing 
genes. These genetic abnormalities, which may be inherited 
or acquired, lead to the 'big six' hallmark traits of cancer, 
namely: (a) self-sufficiency in proliferative growth signals; 
(b) insensitivity to growth inhibitory signals; (c) evasion of 
apoptosis; (d) acquisition of limitless replicative potential; 
(e) induction of angiogenesis; and (f) induction of invasion 
and metastasis [5,6]. 

As will be discussed in subsequent sections, the use of 
microarrays can be extremely valuable both in understand- 
ing the basic biology and in the treatment of cancer. Impor- 
tant applications include: (a) development of a more global 
understanding of the gene expression changes that contrib- 
ute to malignant progression; (b) discovery of diagnostic 
and prognostic indicators and biomarkers of response; (c) 
identification and validation of new molecular targets; (d) 
provision of an improved understanding of the molecular 
mode of action during lead identification and optimisation; 
(e) prediction of potential side-effects during preclinical 
development and toxicology studies; (f) confirmation of the 
molecular mode of action during hypothesis-testing early 
clinical trials; (g) identification of genes involved in con- 
ferring drug sensitivity and resistance; and (h) prediction of 
patients most likely to benefit from the drug and use in 
general pharmacogenomic studies (Fig. 1). 

The use of gene expression microarrays in the above- 
mentioned ways can be carried out regardless of the nature 
of the molecular target. It should be noted, however, that as
a result of our improved understanding of the genetics and
molecular biology of cancer, there is an increasing move 
away from relatively non-selective cytotoxic drugs towards 
the new generation of molecular therapeutic agents that 
target the key molecular abnormalities that drive malignant 
progression and which, as a result, have an impact on one or 
more of the six hallmark traits of cancer listed above [7,8]. 
Proof of principle is now emerging that these selective 
agents show biological and therapeutic activity by the de- 
sired mechanism, not only in preclinical models but also in 
the cancer patient; leading examples are the monoclonal 
antibody Herceptin (trastuzumab) in erbB2-positive breast 
cancer, the bcr-abl tyrosine kinase inhibitor Glivec
(STI571) in Philadelphia chromosome-positive chronic myelogenous
leukemia and acute lymphocytic leukemia, and
the epidermal growth factor receptor tyrosine kinase inhibitor
Iressa (ZD1839) in non-small cell lung cancer [7,8].
Microarray technology will be especially useful for the 
discovery, development, and clinical use of such genome- 
based molecular therapeutics. 

1.3. Why measure global gene expression? 

The gene expression profile of a cell determines its 
phenotype, function, and response to the environment. The 
complement of genes expressed by a cell is very dynamic 
and will respond rapidly to external stimuli. Therefore, 
measurement of gene expression can potentially provide 
clues about regulatory mechanisms, biochemical pathways, 
and broader cellular function. In addition, the determination 
of genes expressed in disease tissue compared with the 
normal counterpart will further the understanding of disease 
pathology and identify potential points for therapeutic in- 
tervention. The underlying genetic basis of cancer makes 
this especially valid as tumor behavior is likely to be dic- 
tated in a combinatorial fashion by the mutation and abnor- 
mal expression of hundreds of genes. Thus, analysis of 
global gene expression patterns has the potential to predict 
biological behavior and clinical consequences, an expecta- 
tion that will revolutionise cancer diagnosis and treatment. 

In traditional biological and pharmacological experi- 
ments, it is usual to measure the expression of only a single 
gene or, at most, a handful of genes. A simple example of 
the potential benefit of measuring gene expression in cancer 
pharmacology is illustrated by the study of Wosikowski et 
al. [9], who measured the expression of epidermal growth 
factor receptor, transforming growth factor-α, and c-erb-B2
in the 60 human tumor cell line panel of the US National 
Cancer Institute. A database of 49,000 compounds was then 
searched for agents for which cytotoxicity correlated with 
high level expression of the receptor and ligand mRNAs, 
and a number of correlations were noted. These were indic- 
ative of compounds that were potentially acting as inhibitors 
of the above mentioned tyrosine kinase receptors and re- 
lated pathways. Though successful, there is a considerable 
limitation to the approach. Given that there are 30,000-40,000
human genes, screening the entire inventory of
genes with techniques that measure only a few genes at one
time is impractical. A more efficient and complete strategy
would be to measure simultaneously all the genes, or a large
defined subset, encoded by the human genome. In addition
to obtaining a global view of gene expression, this strategy
could also identify genes previously unassociated with particular
cellular processes and would also remove the intellectual
bias that is inevitably generated when examining a
single gene or small numbers of genes. A subsequent study
attempted to address this challenge by correlating the expression
across the 60 cell line panel of 140 genes, as
measured at the protein level by two-dimensional gels, with
the cytotoxicity of approximately 4000 different compounds
[10]. A number of drug sensitivity/expression correlations
were noted, involving genes encoding products such as
MDR-1, heat shock proteins, and p53, and these suggested
a number of testable hypotheses. More importantly, this
study demonstrated the utility that an expression profiling
approach could potentially have in the development of cancer
therapeutics. Ideally then, we would like to be able to
profile total gene expression patterns in large numbers of
experimental samples.

Fig. 1. Various phases in the discovery and development of therapeutic agents and of diagnostic, prognostic, and other biomarkers. Microarray expression
profiling can be used to help advance all stages of this process.
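The correlation strategy used in the cell line panel studies above can be illustrated with a minimal sketch. It assumes that the expression of one gene and the sensitivity to each compound have been measured in the same ordered set of cell lines, and it uses the Pearson coefficient as the measure of association; the statistic actually used in Refs. [9,10] is not stated here, and the function names, compound names, and values below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def rank_compounds(expression, sensitivity):
    """Rank compounds by how well sensitivity tracks expression across cell lines."""
    scores = {name: pearson(expression, values) for name, values in sensitivity.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values: one receptor mRNA level and three compounds in five cell lines.
receptor_mrna = [0.2, 1.5, 3.1, 0.4, 2.8]
sensitivity = {
    "compound_A": [0.1, 1.4, 3.0, 0.5, 2.9],  # tracks receptor expression
    "compound_B": [2.0, 1.9, 2.1, 2.0, 1.8],  # essentially unrelated
    "compound_C": [3.0, 1.6, 0.2, 2.7, 0.3],  # inversely related
}
ranking = rank_compounds(receptor_mrna, sensitivity)
```

Compounds at the top of such a ranking would be candidates for follow-up as potential inhibitors of the pathway in question. Repeating the calculation for every gene is what makes a genome-wide measurement attractive.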



1.4. Why measure mRNA? 

Gene expression can be assessed by measuring the quan- 
tity of the final product, i.e. the protein, or its intermediate, 
the mRNA template. At face value it would appear to be 
more rational to measure either the expression or activity of 
the final product, as this is the unit of function within the 
cell. Consistent with this, the few studies that have directly 
compared the relationship between mRNA and protein lev- 
els have found a poor correlation between the expression of 
selected proteins and their mRNAs. For example, the levels 
of the protein products of genes with similar abundances of 
mRNAs could vary up to 20-fold in Saccharomyces cerevi- 
siae; likewise there was a 30-fold variance in the levels of 
mRNAs encoding proteins that were expressed with com- 
parable abundance [11,12]. These observations, coupled 
with the fact that the final functional product of gene ex- 
pression is measured rather than the intermediate mRNA 
species, imply that protein-based rather than transcript- 
based methods would be preferable. Analysis of protein also 
allows assessment of the influence of post-translational 
modifications such as phosphorylation, glycosylation, and 
proteolytic processing. However, the efficiency with which 
Table 1

Useful genome and microarray websites

Website                                                                 Location or other detail

http://www.sanger.ac.uk/                                                Sanger Centre
http://genome.ucsc.edu/                                                 Human genome annotation
http://www.ensembl.org/                                                 Human genome annotation
http://www.ncbi.nlm.nih.gov/UniGene/                                    UniGene gene clustering
http://www.tigr.org/tdb/hgi/index                                       The Institute for Genomic Research
http://www.ebi.ac.uk/                                                   European Bioinformatic Institute
http://www.discover.nci.nih.gov/textmining/filters                      Medminer
http://www.ncbi.nlm.nih.gov/Omim/                                       Online Mendelian Inheritance in Man
http://www.genome.ad.jp/kegg/                                           KEGG: Kyoto Encyclopedia of Genes and Genomes
http://www.hmorf.com                                                    Gene expression and annotation
http://www.icr.ac.uk/array/array                                        Institute of Cancer Research, UK
http://www.nhgri.nih.gov/DIR/Microarray/main                            Cancer Genetics Branch, National Human Genome Research Institute
http://cmgm.stanford.edu/pbrown/                                        Brown lab
http://genome-www4.stanford.edu/MicroArray/SMD/                         Stanford microarray database
http://cmgm.stanford.edu/cgibin/cgiwrap/uebshin/dcforum/dcboard.cgi     Microarray discussion forum
http://rana.lbl.gov/                                                    Eisen lab
http://www.ebi.ac.uk/microarray/                                        European Bioinformatic Institute - microarray site
http://www.ncbi.nlm.nih.gov/geo/                                        Gene expression omnibus
http://www.discover.nci.nih.gov/                                        Genomics and bioinformatics group, National Cancer Institute, USA
http://www.microarrays.org/                                             General information
http://www.umich.edu/~caparray/                                         University of Michigan-Comprehensive Cancer Center
http://w95vcl.neuro.chop.edu/vcheung/                                   The Genomics group at Children's Hospital, Philadelphia
http://sequence.aecom.yu.edu/bioinf/funcgenomic                         Functional Genomics at the Department of Molecular Genetics, Albert Einstein College of Medicine, New York
http://www.mged.org                                                     Microarray gene expression database group
http://www.affymetrix.com/                                              Affymetrix
http://www.resgen.com/                                                  Research Genetics
http://www.ambion.com/                                                  Ambion
http://www.clontech.com/                                                Clontech
http://www.rii.com/                                                     Rosetta Inpharmatics Inc.
http://www.axon.com/                                                    Axon Instruments Inc.
http://www.apbiotech.com/                                               Amersham Pharmacia
http://www.chem.agilent.com/                                            Agilent Technologies

See Ref. 95 for additional information.



genome project has not yet reached this stage, although the 
goal may not be far away. Therefore, construction of a 
human gene-specific array requires a choice to be made 
from a range of gene sequences (probes) selected from 
public databases, such as GenBank, dbEST, UniGene, or 
proprietary databases [25-27]. The UniGene database 
(http://www.ncbi.nlm.nih.gov/UniGene/) is an excellent 
starting point for identifying and choosing the DNA se- 
quences to be arrayed. This database is an experimental 
system for automatically partitioning clone sequences into 
non-redundant sets of gene-oriented clusters that represent 
unique genes. It is a very useful gene discovery resource, as 
in addition to sequences of well-characterised genes, hun- 
dreds of thousands of novel ESTs corresponding to genes of 
unknown function are included. cDNAs from local sources 
can also be included or, for gene discovery projects, cDNA 
libraries of uncharacterised clones can be used. A good 
alternative starting point for assembling a list of clones for 
arraying is the 15K set (http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/)
that comprises approximately 15,000
UniGene clusters. 

Several factors govern the choice of clones to be arrayed.
The presence within the clone of repeat sequences, such as
Alu or LINE repeats, can influence the hybridisation signal 
and should be avoided. Another factor to be considered is 
that some ESTs exhibit weak UniGene clustering and, as the 
procedures for automated sequence clustering are still under 
development, the results may change from time to time as 
improvements are made. In some instances, clones will be 
reclustered from one gene cluster to another and in essence 
change identity. This effect will become less of an issue as 
all the genes encoded by the human genome are definitively 
identified (estimated completion 2004). In the interim, it is 
necessary to update putative gene identities on a regular 
basis. Alternatively, mapping clone sequences directly onto 
the human genome sequence using the Ensembl package at 
European Bioinformatic Institute (http://ensembl.ebi.ac.uk/) 
or searching the Institute for Genomic Research human 
gene index (http://www.tigr.org/tdb/hgi/), which assembles 
ESTs into tentative human consensus sequences, can some- 
times identify conflicts or clustering artifacts. There has also 
been some evidence of discrepancy between the actual and 
designated sequences of some clones and also cases of 
clones being mixed. Mixed clones can be detected and
avoided by including a quality control step using gel electrophoresis
to exclude PCR reactions that yield multiple
products. Most clones available for arraying are sequence
validated, but some discrepancies have been reported
with sequence-verified mouse clones [28]. These
potential problems reinforce the need to validate
microarray observations using complementary methods.

P.A. Clarke et al. / Biochemical Pharmacology 62 (2001) 1311-1336

Fig. 2. Schematic of the various steps in a microarray experiment. Nylon membrane arrays: test RNA and reference RNA; bound probe is detected by phosphorimaging; data are normalised and expressed as the ratio of test:reference (pseudocoloured). Glass slide arrays: test RNA and reference RNA; Cy3 and Cy5 fluorescent emission is measured; data are normalised and expressed as the ratio of Cy5:Cy3 (pseudocoloured). [...]
Another important decision is the choice of [...]
a solid support such as nitrocellulose, charged nylon, or
glass. The kinetics of duplex formation are influenced
by diffusion of solvent or solutes into and out of pores and
by local effects within a pore. In this respect glass has some
practical advantages, as it is non-porous and rigid, [...]
reading easier. In the future, these
factors may allow miniaturisation and automation by incorporation
into flow cells [29]. As mentioned above, glass
slides allow test and control samples to be compared directly
on the same array, whereas nylon membranes require
parallel processing of samples on separate arrays. On the
other hand, arrays on glass slides are restricted to a single
use, whereas nylon arrays are generally probed with radiolabeled
cDNAs and, with care, can be used a number of
times, a consideration that may be important for smaller
institutions or laboratories where access to core facilities is
restricted or when resources are limited.

Oligonucleotides synthesised in situ use synthetic linkers
[...] poly-L-lysine or aminosilane. The
[...] dissolved in 50% DMSO [...]

2.3. Sample preparation and labeling 

[...] with labeled representations of the cellular RNA [...]

cultures that appeared to have no phenotypic differences.
These fluctuations represent biological noise. In parallel
experiments with yeast mutant strains that lacked a growth
[...], fluctuations accounted for virtually all of the
>2-fold changes in gene expression that were detected. Thus, [...]

effects of drug treatment regimens. These include vehicle
controls and, ideally, the use of inactive drug analogues.
Appropriate control vectors and transfection controls should
[...] gene transfer experiments. [...]
sample handling must also be standardised to avoid [...]
resulting from prolonged manipulation. The effects [...]
expression of infective agents, such as mycoplasma, that
frequently contaminate cell cultures are [...]
et al. [31] noted that aneuploidy in [...]
gene dosage effects from an altered gene [...]
of establishing worldwide databases for [...] of gene [...]

a number of potential pitfalls. Biopsy [...]
limited in availability, is generally [...]
depending on the [...]
populations. In the case of tumor biopsies, [...]
underlying gene expression patterns from non-[...]
including components such as surrounding normal tissue,
stroma, immune elements, and vasculature [33]. However,
this can be taken into account by comparison of gene expression
patterns with those in relevant cell lines [...]
transcriptome profiles characteristic of [...]
or stroma [33,34]. An alternative is to dissect out [...]
cells, for example by using laser capture microdissection,
although this can be [...]. An
additional factor to be considered is the potential for [...]
handling effects. A number of genes, including c-fos and
junB, have been reported to be induced by prolonged handling
of the sample [33]. Thus, the time between the [...]
being taken and snap-freezing in liquid nitrogen or [...]
in lysis buffer may also be critical.
The labeled representations of cellular mRNA are generated
using reverse transcriptase, an enzyme that generates
a single-strand DNA copy of each RNA. Generally, ³³P-dNTPs
are used for radiolabeling protocols, and [...]
Cy dNTPs, namely Cy3-dNTPs and Cy5-dNTPs, [...]
together in the case of fluorescent methods. [...]
common labeling [...]
biotin derivatives or ligation to an RNA [...]
biotin that can subsequently be [...]
streptavidin. Both total cellular RNA and [...] mRNA
can be used for array experiments. [...]
requires an oligo[...]
that are specific [...]
glass slide arrays [...] labeled cDNA,
as these can be very susceptible to [...]
induced by [...]
protein contaminants that can co-purify with RNA. In addition,
the methodology of RNA extraction can influence
the proportion of different RNA [...]
that may affect [...] comparing different
methodologies [39]. [...]
labeling techniques [...]
These requirements potentially make microarrays incompatible
with the very limited RNA [...]
of biopsies [...]
efficient and reproducible, but suffers from the [...]
polymerase promoter sequence [37]. The [...]
transcription [...]

2.4. Hybridisation and image acquisition




semble into duplex in a process that is reversible but occurs 
with absolute fidelity [4]. In effect, each labeled cDNA
searches out and pairs with its complementary sequence on 
the array. The rules of recognition and the major factors that 
govern them, i.e. base composition, temperature, concentra- 
tion of monovalent and divalent ions, and sequence com- 
plexity, are reasonably well understood. The signal obtained 
following hybridisation to the arrays gives both a measure 
of the number of molecules bound and also their identity. 
The labeled cDNA is generally purified to remove unincorporated
nucleotides, salt, detergents, PCR primers, proteins,
and RNA template prior to hybridisation to the array.
Pre-hybridisation, hybridisation, and stringency wash conditions
are similar to those frequently used for Southern or northern 
blotting. Blocking reagents are used to prevent non-specific 
interactions. Those commonly used include Denhardt's solution,
SDS, sonicated salmon sperm DNA, tRNA, COT-1 DNA,
and poly(A) oligonucleotides. The potential for background
fluorescence when using glass slide arrays necessitates
the extra precautions of filtering all solutions and
taking care that hybridisation buffer components, such as 
SDS or urea, do not precipitate out while setting up the 
hybridisation or washing the arrays. Hybridisation of radio- 
labeled cDNAs can be detected by autoradiography and 
densitometry or more preferably by phosphorimaging using 
commercially available phosphorimagers. The hybridisation 
of fluorescently labeled cDNAs is detected using laser scan- 
ners that excite the fluorescent dyes and collect their emis- 
sion at the relevant wavelengths. 

2.5. Image and data analysis 

A number of steps are necessary to obtain gene expres- 
sion data following acquisition of an array image. The first 
is to correctly identify the spots on the array. Most com- 
mercial readers or arrayers provide software for this step. In 
addition, several public sites provide unsupported software 
also developed for this purpose (e.g. http://www.nhgri.nih. 
gov/DIR/LCG/15K/HTML/). Spot identification requires 
overlaying and aligning of a grid specifying spot location 
onto the array image, a process that generally necessitates 
considerable operator input. Having located the spots, the 
background signal has to be calculated and subtracted from 
the hybridisation signal. This is done using algorithms that 
predict the expected position, size, and shape of the spot and 
then calculate local background in the vicinity of each spot. 
The solid format of glass has some particular advantages 
over nylon membranes in this respect, as nylon membranes 
can exhibit some degree of creasing or deformation that can 
confound grid alignment. The nature of radioactive decay 
also makes identification of the border of the spot and 
calculation of background difficult; this is because within 
the spot there is smooth graduation from high signal to 
background signal, as compared with the sharp defined spot 
resulting from a fluorescent signal. An additional problem 
with radiolabeled probes is the effect of 'blossoming',
where a high hybridisation signal encroaches on a neighboring
spot and influences the signal detected from that
spot. This effect can be reduced by using ³³P-labeled probes
rather than ³²P. However, one advantage of nylon membranes
is that they are less prone to background effects
caused by contaminants within the probe, precipitation of 
components of the hybridisation buffer, and also dust or 
irregularities on the slide surface that can affect fluorescent
signals. These points also emphasise the fact that for all 
array types there is a need for operator interaction with the 
data, in effect to curate the data by flagging up bad spots. 
This curation process is currently one of the rate-limiting 
steps in the acquisition of array data. 
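The background subtraction step described above can be sketched as follows. This is a simplified illustration, not the algorithm of any particular software package: it assumes that the spot centre is already known from grid alignment, that spots are circular, and that local background is estimated as the median of a ring of pixels surrounding the spot. The function name, radii, and pixel values are hypothetical.

```python
from statistics import mean, median


def spot_signal(image, row, col, spot_radius, bg_radius):
    """Background-corrected mean intensity of one spot.

    Pixels within spot_radius of (row, col) are treated as the spot;
    pixels beyond spot_radius but within bg_radius form the local
    background ring, whose median is subtracted from the spot mean.
    """
    spot, ring = [], []
    for r, line in enumerate(image):
        for c, value in enumerate(line):
            dist_sq = (r - row) ** 2 + (c - col) ** 2
            if dist_sq <= spot_radius ** 2:
                spot.append(value)
            elif dist_sq <= bg_radius ** 2:
                ring.append(value)
    if not spot or not ring:
        raise ValueError("spot or background region is empty")
    return mean(spot) - median(ring)


# Hypothetical 7 x 7 image: uniform background of 10 with a small spot of 110.
image = [[10.0] * 7 for _ in range(7)]
for r, c in [(3, 3), (2, 3), (4, 3), (3, 2), (3, 4)]:
    image[r][c] = 110.0

corrected = spot_signal(image, 3, 3, spot_radius=1, bg_radius=3)
```

Using the median rather than the mean for the ring is a design choice that makes the estimate less sensitive to a bright neighbouring spot or a speck of dust falling inside the background region.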

Comparison of data from multiple arrays or multiple 
samples on a single array requires the data to be normalised. 
This can be achieved using two different approaches. One 
strategy is to normalise to a set of genes that do not vary 
under the experimental conditions of choice (e.g. the 90 control 
genes described by DeRisi et al. [40]). Alternatively, one can
make the assumption that, under the conditions studied, the 
expression of the majority of genes will remain unchanged, 
allowing normalisation by comparison with global gene ex- 
pression. This approach works well for closely related samples, 
but obviously yields poorer normalisation as the samples 
diverge and the differences in gene expression increase. 
Another option is to 'spike' each RNA sample with an equal 
amount of an internal standard RNA. For arrays using both 
fluorescent and radiolabeled probes there are commercially 
and publicly available software packages that can overlay 
array images, calculate background, normalise the data, and 
produce an output of absolute expression or ratio of expres- 
sion comparing test sample to the reference sample. 
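Two of the normalisation strategies described above, normalising to a set of invariant control genes and normalising to global gene expression, can be expressed with the same small routine. The sketch below is illustrative only: it assumes background-corrected intensities for a test and a reference sample, works on log2 ratios, and centres them on the median of the chosen genes. The choice of the median, the function names, and the gene names and values are assumptions made for the example.

```python
import math
from statistics import median


def log2_ratios(test, reference):
    """log2(test / reference) for genes with positive signal in both samples."""
    return {
        gene: math.log2(test[gene] / reference[gene])
        for gene in test
        if gene in reference and test[gene] > 0 and reference[gene] > 0
    }


def normalise(ratios, control_genes=None):
    """Centre log2 ratios on the median of the control genes.

    If no control genes are supplied, all genes are used, which
    corresponds to assuming that most genes are unchanged.
    """
    basis = [ratios[gene] for gene in (control_genes or ratios) if gene in ratios]
    if not basis:
        raise ValueError("no genes available for normalisation")
    shift = median(basis)
    return {gene: value - shift for gene, value in ratios.items()}


# Hypothetical intensities: the test channel is uniformly twice as bright,
# and only gene3 is genuinely induced (a further 4-fold).
test = {"gene1": 200.0, "gene2": 400.0, "gene3": 1600.0, "ctrl1": 200.0, "ctrl2": 200.0}
reference = {"gene1": 100.0, "gene2": 200.0, "gene3": 200.0, "ctrl1": 100.0, "ctrl2": 100.0}

raw = log2_ratios(test, reference)
by_controls = normalise(raw, control_genes=["ctrl1", "ctrl2"])
by_global = normalise(raw)
```

In this example both strategies remove the two-fold labeling bias and leave gene3 with a log2 ratio of 2. They would diverge if a large fraction of genes changed in the same direction, which is the situation in which global normalisation performs poorly.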

Having obtained absolute values or ratios of gene ex- 
pression, it is necessary to establish a database that allows 
the management and comparison of the information ob- 
tained from multiple experiments. The aim is to take the raw 
hybridisation data and simplify this information down to a 
table of gene/clone identity, expression value, or ratio of 
test:reference, and then to integrate this information with 
databases that contain genomic data, functional information, 
or literature references [41]. Database design is important as 
large quantities of information have to be managed before 
(data pertinent to clone location, identity, and array fabri- 
cation conditions) and after (data on experimenter, experi- 
mental aims and conditions, sample description, labeling 
conditions, raw hybridisation images, intensities, ratios, and 
background) carrying out the experiment [42]. It is becom- 
ing increasingly clear that biological function generally re- 
sults from complex interactions between many components; 
for example, transcriptional regulation is more sophisticated 
than the traditional view of gene expression being a simple 
on-off event. In addition, biological systems generally in- 
corporate features such as feedback, feed-forward, error 
checking, and redundancy [43,44]. Therefore, the acquisi- 
tion of sufficiently large datasets is essential to address 
complex biological systems or genome-wide function. 
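The kind of record keeping described above might be organised along the following lines. This is only a sketch of one possible data structure, holding a small subset of the information listed in the text; the class names, field names, and example values are hypothetical and do not correspond to any published database schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SpotRecord:
    """One arrayed clone and its measured test:reference ratio."""
    clone_id: str
    gene: str
    ratio: float


@dataclass
class Experiment:
    """Information recorded when carrying out one hybridisation."""
    experimenter: str
    sample: str
    labeling: str
    spots: list = field(default_factory=list)

    def table(self):
        """Simplify the experiment to rows of (clone, gene, ratio)."""
        return [(s.clone_id, s.gene, s.ratio) for s in self.spots]


def merge(experiments):
    """Combine experiments into gene -> {sample: ratio} for comparison."""
    merged = {}
    for exp in experiments:
        for spot in exp.spots:
            merged.setdefault(spot.gene, {})[exp.sample] = spot.ratio
    return merged


# Hypothetical entries for two time points of a drug treatment.
early = Experiment("analyst_1", "treated_2h", "Cy5/Cy3",
                   [SpotRecord("clone_0001", "TP53", 2.0),
                    SpotRecord("clone_0002", "FOS", 8.0)])
late = Experiment("analyst_1", "treated_24h", "Cy5/Cy3",
                  [SpotRecord("clone_0001", "TP53", 4.0)])

merged = merge([early, late])
```

Keeping the gene identity separate from the clone identifier allows putative gene assignments to be updated when clustering changes, without altering the underlying measurements.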
Once a suitable dataset has been established, the process
of mining the data for meaningful information begins [41-43].
The easiest form of analysis is simply to list all of the
genes that differ in expression level, for example between
the test sample and the control. Although useful, this approach
does not uncover the large amounts of complex
information contained within the data. More sophisticated
analysis involves the identification of non-random groups of
genes associated with a particular biological situation, so
that these can be examined by further approaches. Multidimensional
scaling in two or three dimensions of Euclidean
distances allows the assessment of the approximate degree
of similarity in gene expression between samples. Another
common approach is to look at multiple experiments and to
arrange or cluster the expression data into small homogeneous
groups. This can be done manually with small data
sets where identification of the extremes between two samples
or over a short time course is relatively straightforward.
However, this approach fails to extract all the potential
information in genome-scale experiments with hundreds or
thousands of samples. Therefore, mathematical approaches
that essentially organise the data by grouping genes with
similar expression patterns have been developed.
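The two simplest analyses mentioned above, listing the genes that differ between test and control and computing Euclidean distances between samples, can be sketched as follows. The 2-fold threshold is an assumption chosen for illustration, as are the function names, gene names, and values; the distances would then serve as the input to multidimensional scaling or to a clustering procedure.

```python
import math
from itertools import combinations


def differential_genes(test, control, min_fold=2.0):
    """List (gene, ratio) pairs changed at least min_fold in either direction."""
    hits = []
    for gene in test:
        if gene in control and test[gene] > 0 and control[gene] > 0:
            ratio = test[gene] / control[gene]
            if ratio >= min_fold or ratio <= 1.0 / min_fold:
                hits.append((gene, ratio))
    # Largest changes first, regardless of direction.
    return sorted(hits, key=lambda item: abs(math.log2(item[1])), reverse=True)


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_distances(profiles):
    """Pairwise Euclidean distances between named samples."""
    return {
        (p, q): euclidean(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


# Hypothetical intensities for three genes in a test and a control sample.
test = {"geneA": 400.0, "geneB": 100.0, "geneC": 25.0}
control = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
changed = differential_genes(test, control)

# Hypothetical log2 ratio profiles (same three genes) for three samples.
profiles = {
    "tumor_1": [1.0, 0.0, -2.0],
    "tumor_2": [1.0, 0.5, -2.0],
    "normal": [0.0, 0.0, 0.0],
}
distances = sample_distances(profiles)
```

Here the two tumor samples lie much closer to each other than either does to the normal sample, which is the kind of relationship that multidimensional scaling displays graphically.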

One strategy is to use hierarchical clustering, an approach
commonly used in sequence/phylogeny experiments,
coupled with outputs that facilitate visual examination
of the data (Fig. 3) [45]. Several studies, especially in
yeast where greater than 50% of the ORFs have an ascribed
function, have noted co-regulation of the expression of
genes encoding proteins from a common biochemical pathway,
that are functionally related, or that form multi-protein
complexes. This has led to the basic proposal that genes
with similar expression profiles and behavior are likely to be
co-regulated and functionally related [4]. Thus, it is possible
to cluster the data from a given set of conditions or cell
types and use a 'guilt by association' strategy to identify
functional clusters [44]. In this way, the function of unknown
genes can be predicted, and tentative function can
then be tested rigorously by conventional biochemical approaches.
In addition, co-regulation may also allow the
identification of common regulatory elements within the
promoter sequences of genes that cluster together. Although
3.1. Basic biology of cancer

involved in cell cycle regulation, budding, cell polarity, and
centrosome organisation. 

In addition, the distribution of regulatory sequences up- 
stream of the gene promoter was analysed in both of the 
yeast studies described above [50,51]. A number of gene 
clusters had selective upstream motifs; for example, 55 of 
the 101 genes that clustered for polarity and budding had an 
early cell cycle box sequence motif, which occurred with an 
incidence of less than 4% in the other gene clusters. Addi- 
tional analysis also revealed the potential presence of two 
novel regulatory sequence motifs. Half of the gene clusters 
were enriched for functional categories or for a specific up- 
stream motif. Clusters with functional relationships were sta- 
tistically tighter than those that contained genes that appeared 
to be biologically unrelated, and those with related regulatory 
motifs were also more tightly clustered. One important obser- 
vation was the identification of previously uncharacterised 
ORFs that exhibited periodic fluctuation, and these were as- 
signed to clusters containing known genes with related func- 
tion. This work suggested that the function of unknown genes 
could be predicted by examining their clustering pattern, an 
observation that will be discussed later in this review. 
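
The selectivity of a motif for a cluster can be given a rough statistical footing. The sketch below asks how likely it would be to see at least 55 motif-bearing genes among 101 if each gene independently carried the motif at the 4% background incidence. The independence assumption and the use of 4% (quoted above only as an upper bound) are simplifications made for illustration; this is not the test used in the original studies.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(
        comb(n, i) * p ** i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


cluster_size = 101      # genes in the polarity and budding cluster
with_motif = 55         # of which carry the early cell cycle box
background_rate = 0.04  # assumed incidence elsewhere (upper bound)

expected = cluster_size * background_rate
p_value = binomial_tail(with_motif, cluster_size, background_rate)

print(f"expected by chance: {expected:.2f} genes")
print(f"P(at least {with_motif} of {cluster_size}) = {p_value:.3g}")
```

Under these assumptions only about four motif-bearing genes would be expected by chance, so an observed count of 55 is very unlikely to be a sampling accident.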

Transcription factors are frequently targets of many sig- 
naling pathways, and a number of studies have profiled gene 
expression following responses mediated by external sig- 
nals. The expression of 8600 distinct human genes has been 
profiled by microarray analysis at regular intervals follow- 
ing the stimulation of normal human diploid fibroblasts by 
serum [51]. The response to serum was rapid and wide- 
spread. In all, the expression of 517 genes was altered by 
more than 2.2-fold, with genes such as c-fos, junB, and 
MAPK phosphatase being induced within 15 min of exposure to
serum. The timing of gene expression was coincident
with progression through the cell cycle, and distinct clusters 
of genes involved in cell division were identified. A less 
expected finding was the observation that genes involved in 
wound healing were also induced, although in hindsight this 
may not be that surprising given the role of fibroblasts and 
serum in that process. The results suggested that fibroblasts, 
which are not normally in contact with serum, are programmed
to respond abruptly to serum and to orchestrate
the healing response by promoting chemotaxis and differ- 
entiation of the various cell types involved in the immune 
response and angiogenesis. 
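
The selection of responsive genes by a fold-change cut-off, as in the serum study, can be sketched as follows. Only the 2.2-fold threshold comes from the text; the gene list, the signal intensities, and the function names are hypothetical, and real analyses would also have to deal with background subtraction and normalisation.

```python
def max_fold_change(baseline, series):
    """Largest fold change relative to baseline, counting induction and
    repression symmetrically (a 3-fold drop scores 3.0)."""
    folds = []
    for value in series:
        ratio = value / baseline
        folds.append(ratio if ratio >= 1.0 else 1.0 / ratio)
    return max(folds)


def responsive_genes(time_courses, threshold=2.2):
    """Names of genes whose expression changes by more than `threshold`-fold
    at any time point. All intensities must be positive."""
    return sorted(
        gene
        for gene, (baseline, series) in time_courses.items()
        if max_fold_change(baseline, series) > threshold
    )


# Hypothetical signal intensities: (time-zero value, later time points).
time_courses = {
    "c-fos": (100.0, [900.0, 400.0, 150.0]),
    "junB": (200.0, [700.0, 500.0, 260.0]),
    "housekeeping": (500.0, [520.0, 480.0, 510.0]),
    "repressed": (300.0, [250.0, 120.0, 100.0]),
}

print(responsive_genes(time_courses))
```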

In another study of signal transduction, the induction of 
immediate early genes by receptor tyrosine kinase-activated 
signaling pathways was examined using receptors which 
lacked key binding sites for the intracellular adapter mole- 
cules that are required for the regulation of different signal- 
ing cascades [52]. This study used NIH3T3 cells transfected 
with a gene encoding a fusion protein, consisting of the
wild-type or mutated cytoplasmic portion of β-platelet-derived
growth factor receptor fused to the extracellular portion of
the macrophage colony-stimulating factor receptor. High-density
oligonucleotide arrays were used to measure the
expression of 5938 genes following exposure of NIH3T3
cells expressing the fused receptor to macrophage colony- 
stimulating factor. Sixty-six genes were induced by more 
than 3-fold following an exposure for a few hours to mac- 
rophage colony-stimulating factor, of which 50% had been 
identified previously in the literature as immediate early 
genes induced by receptor tyrosine kinase activation. The 
mutant form of the β-platelet-derived growth factor receptor
that lacks binding sites for phospholipase C-γ1,
phosphoinositide-3-kinase, shp2, and rasgap could still induce these
genes, but to a slightly lower level than the wild-type re- 
ceptor. Removal of an additional site that was bound by 
grb2 resulted in a greatly reduced induction, but did not 
eliminate completely the induction of immediate early 
genes. These observations suggested that the distinct pathways
emanating from the β-platelet-derived growth factor
receptor exhibited a certain degree of functional redundancy 
for the induction of immediate early genes and also indi- 
cated that diverse pathways could exert overlapping effects 
on the induction of these genes. However, there was some 
evidence for specificity, as restoring a rasgap binding site on 
the β-platelet-derived growth factor receptor resulted in the
induction of genes normally regulated by interferon-γ, although
the physiological significance of this was unclear.
An additional set of experiments compared gene expression 
following the stimulation of NIH3T3 cells with either fi- 
broblast growth factor or platelet-derived growth factor. 
Again the same pattern of immediate early gene induction 
was detected. These observations suggested considerable 
overlap and cross-talk in the regulation of gene expression 
by different signaling pathways. However, subtle differ- 
ences may potentially have been missed as high, non-phys- 
iological concentrations of macrophage colony-stimulating 
factor were employed, as well as a relatively high 3-fold 
cut-off for the gene expression changes. 

Other studies have used microarray analysis to examine 
the downstream effectors of signaling pathways. Roberts et 
al [53] profiled gene expression in S. cerevisiae during the 
pheromone response of wild-type and mutants of the vari- 
ous mitogen-activated protein kinases involved in a number 
of different signal transduction pathways. They found sub- 
sets of co-expressed genes that reflected the activity, cross- 
talk, and overlap of the different signaling pathways regu- 
lated by specific mitogen-activated protein kinases, 
particularly two distinct mitogen-activated protein kinase 
mutants that revealed overlap between filamentous growth 
and mating responses. Guo et al [54] used c-myc null and 
wild-type rat fibroblasts to identify c-myc responsive genes. 
Myc was found to regulate genes involved in protein syn- 
thesis and metabolism, suggesting that its role may be to 
prepare the synthetic apparatus for the demands of prolif- 
eration. The effects of one of the immediate early genes, 
egr1, a regulator of transcription that is frequently
overexpressed in prostate tumors, were examined using an
adenoviral-mediated expression of egr1 in a prostate cancer cell
line [55]. The expression of a number of genes encoding
proteins involved in neuroendocrine differentiation, as well
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as several growth factors (platelet-derived growth factor-A, 
insulin-like growth factor-II, and transforming growth factor
β1), was increased following egr1 expression. These
observations suggested an early role for egr1 in prostate
malignancies, although further validation is required.

The studies described thus far in this section illustrate the 
potential power of gene expression profiling by microarrays 
and exemplify the concept of co-regulation of banks of 
functionally related genes. As mentioned, genes whose 
products have similar cellular functions, those that encode 
products which form large complexes and those which op- 
erate on the same biochemical pathway frequently exhibit 
co-regulation of expression. Observations of this type of 
relationship can also be used to generate testable hypotheses 
when previously uncharacterised genes cluster with genes 
of a specific function or biochemical pathway. In addition, 
microarray expression profiling has also revealed the poten- 
tial interaction between distinct signaling pathways, has 
identified potential regulatory sequence motifs upstream of 
gene promoters, and has provided a broader view of genome 
structure and function than would have been possible with 
conventional methodology. 

Among the first microarray studies directly relevant to 
cancer biology were those of Schena et al [56] and DeRisi 
et al [40]. Schena et al examined heat shock and phorbol 
ester treatment of a leukaemic T cell line using a 1065 
element array and found a number of changes that would be 
expected following those stimuli. DeRisi et al used a 1161
element array to examine the effects of reintroducing chro- 
mosome 6 into a melanoma cell line that lacked this chro- 
mosome. They detected significant changes in the expres- 
sion of 78 genes, of which 16 were re-analysed and 
corroborated by northern blotting. One of the first attempts 
to classify cancer cell lines using microarrays was reported 
by Khan et al [57]. They chose to examine alveolar rhab- 
domyosarcoma as those tumors are relatively uniform ge- 
netically; 70-80% have a translocation that creates a novel 
oncogenic transcription factor by fusing the pax3 or pax7 
gene with fkhr. Also important was the fact that these
tumors are aggressive soft tissue tumors that are frequently 
difficult to classify. A set of cell lines established from 
alveolar rhabdomyosarcoma tumors were profiled with an 
array of 1238 genes and compared to a reference sample 
from a diploid foetal myofibroblast cell line. The data were 
analysed using multidimensional scaling that clustered the 
alveolar rhabdomyosarcoma cell lines together when compared
with Ewing's sarcoma, melanoma, prostate, and breast
cancer cell lines. The genes associated with the alveolar 
rhabdomyosarcoma cluster showed a consistent pattern of 
expression of 37 genes, including those known to be deregulated
in alveolar rhabdomyosarcoma, e.g. pax3-fkhr and
cdk4. These observations were followed up with a subse- 
quent study transfecting pax3 and pax3-fkhr into NIH3T3 
cells [58]. Pax3 induced the expression of only a single gene 
and repressed the expression of three additional genes; in 
contrast, the oncogenic fusion protein induced the expression
of a number of genes, including a high proportion of
muscle-specific genes, some of which were confirmed by northern
blotting in several alveolar rhabdomyosarcoma cell lines.

A smaller study of prostate cancer used nylon membrane 
arrays to compare hormone-sensitive and hormone-insensi- 
tive cell lines grown as xenograft tumors in mice [59]. 
Several genes were induced in the hormone-insensitive tumors
and two, igfbp2 and hsp27, were followed up at the
protein level by immunohistochemistry, using an array of 
tissue biopsies from 238 prostate tumors and 26 benign 
prostate tissues. Igfbp2 was expressed in all recurrent hor- 
mone refractory tumors, some primary tumors, and none of 
the benign tumors. Hsp27 had a similar pattern, but was less 
widely expressed. However, there was no statistical associ- 
ation between igfbp2 or hsp27 expression and tumor stage, 
nor did expression influence response to treatment. Another 
study used a similar strategy to identify a potential marker 
for anaplastic large-cell lymphoma [60]. Thirty-one hema- 
topoietic cell lines were examined with a 588-element nylon 
membrane array. The expression of one gene, clusterin, was 
found to be restricted to the anaplastic large cell lymphoma
cell lines. This was confirmed by western blotting and 
immunohistochemical analysis of 198 primary lymphoma 
biopsies representing most major lymphoma subtypes. With 
two exceptions, none of the non-anaplastic large cell lym- 
phomas expressed clusterin; in contrast, all 36 of the ana- 
plastic large cell lymphomas expressed clusterin. The au- 
thors concluded that although the function of clusterin in the 
disease is unknown, it may have potential as a diagnostic 
marker for anaplastic large cell lymphoma. Another study of 
two clonally related T cell lymphoma cell lines derived 
from the same patient at different stages of tumor progres- 
sion detected changes in genes involved in signal transduc- 
tion, transcription, adhesion, proliferation, and cell death 
[61]. Of particular interest was the observation that expres- 
sion of the gene encoding bleomycin hydrolase was in- 
creased with progression, despite the fact that the patient 
had not been exposed to bleomycin; this suggested that 
resistance genes are not always induced by exposure to 
chemotherapy, but can also be induced by tumor progression. 

In an attempt to understand the mechanisms of metasta- 
sis, Clark et al [62] profiled melanoma gene expression 
using a mouse xenograft model of tumor metastasis. Human 
and mouse cell lines were profiled using an array of 7070 
human and 6347 mouse genes. Three genes, fibronectin, 
rhoC, and thymosin β4, were expressed at higher levels in
the metastases of the two cell lines. Fibronectin is an ex- 
tracellular glycoprotein that is a ligand for the integrin 
family, rhoC is a GTPase that regulates cytoskeletal organisation
in response to extracellular factors, and thymosin β4
is an actin-sequestering protein that regulates actin poly- 
merisation. These genes have all been associated or corre- 
lated previously with tumor metastasis. Other genes that 
were up-regulated to a lesser extent included those whose 
products are associated with cytoskeletal organisation, such 
as α-catenin, α-actinin, and α-centractin, together with
genes encoding extracellular matrix components such as
collagen α2 and α1, matrix Gla protein, fibromodulin, and
biglycan. Having identified potential candidates involved in
metastasis, the authors investigated the role of rhoC
expression using a gene transfer strategy. Expression of rhoC
enhanced metastases, while expression of dominant-negative
rho inhibited metastasis, indicating that rhoC is
important for tumor invasion.

The observations of the cancer microarray studies outlined
above demonstrate that it is possible to classify
tumor cell lines by type using gene expression profiles.
They also show that it is possible to associate particular
genes with cancer types or biological processes such as
metastasis. Again, the results generated provide testable
hypotheses for further investigation. The studies discussed
above were generally restricted to 500-8000 genes, the
majority of which were characterised previously. The use of
larger arrays, incorporating uncharacterised cDNAs or
ESTs and coupled with the 'guilt by association' premise,
may allow the identification of novel genes associated with
particular cancer types, those genes critical for
tumorigenesis and metastasis, and also those genes involved in
the development of resistance or failure to respond to treatment.

3.2. Drug discovery and development 

Following the identification of a specific molecular target
and the development of a series of small molecule inhibitors,
there is a need to confirm that such inhibitors do indeed
act by the desired mechanism of action on their intended
target. In addition, the early identification of potential
on-target and off-target effects, and also of pharmacodynamic
markers of these effects, is highly beneficial for the
subsequent preclinical and clinical studies required for the
development of anticancer agents [7,8,47]. The use of
microarrays can also generate a database that could allow the
mechanism of action of a given agent to be inferred from
the changes in gene expression that it induces. Such
databases also allow compounds likely to act by a given
mechanism to be identified for further study. The application
of array technology in this way has been illustrated in a
number of studies, which are discussed below.

One study in particular is worth reviewing at some
length, as it illustrates many of the important issues
concerning the use of microarrays in pharmacological mechanism
of action work. As described earlier, the growth inhibitory
activity of approximately 70,000 compounds has
been assessed using the US National Cancer Institute's
panel of 60 human tumor cell lines, and the effects of some
of these agents have been compared to the expression of 140
proteins [10]. This strategy has been extended recently to
incorporate gene expression profiling by microarray [34,
63]. Ross et al. [34] profiled gene expression of the tumor
cell line panel using an array consisting of 9703 elements,
corresponding to 3700 previously characterised genes,
1900 identified by sequence similarity to genes encoded by
other organisms, and also 2400 uncharacterised ESTs. The
1167 genes that showed the greatest variation and the 6831
genes that demonstrated a good, consistent hybridisation
signal were subjected to hierarchical cluster analysis, which
arranged the cell lines by their presumptive tissue of origin,
the only exceptions being some non-small cell lung and
breast cancer lines that were distributed in multiple
branches. Characteristic clusters of genes distinguished
each cell line type. For example, melanoma lines were
characterised by a 90-gene cluster that included genes whose
products were specific to melanoma [...],
dopachrome tautomerase, and the MART-1 tumor antigen,
although rather paradoxically two breast cell lines
shared expression of these melanoma genes. Nevertheless, it
was generally possible to find a signature of the
tissue from which the cell lines originated, implying that the
[...] markers of proliferation, or were involved in [...]
for a relationship to compound sensitivity. The gene
expression data were related to the cytotoxicity following a 48-hr
exposure to 70,000 compounds from the National Cancer
Institute collection. The array data were restricted to the
1376 genes that clustered the cell lines by tissue type;
initially, the drug sensitivity data were limited to a set
of 118 agents with relatively well-established mechanisms
of action. The resulting clusters corresponded to different
modes of action and were divided into DNA and DNA/RNA
antimetabolites, tubulin inhibitors, DNA-damaging agents,
topoisomerase I inhibitors, and topoisomerase II inhibitors.
5-Fluorouracil, an inhibitor of RNA and DNA synthesis,
clustered with the RNA synthesis inhibitors, suggesting that
in these cytotoxicity assays the main mechanism of action of
this agent was inhibition of RNA synthesis.

The clustering pattern was altered when the 1376 genes
were analysed in relation to the activity of the 118 drugs.
The antimetabolite and alkylating agent clusters
changed such that [...]
classes. The antifols, purine analogues, and pyrimidine
analogues were separated into separate branches, while the
alkylating agents separated into [...] reactive nitrogen
mustards [...]
changed, whereas the topoisomerase inhibitors were rearranged
in a manner that revealed mechanistic differences
among the subclasses of compound. The topoisomerase
I inhibitors [...] whether or not they required activation,
and the topoisomerase II inhibitors separated into an
anthracycline node and another node that contained
mitoxantrone and the bioreductive agents. This suggested that
the ability to produce double-strand DNA breaks may be an
important feature of the latter. Etoposide clustered with the
alkylating agents, implying that drug metabolism as [...]
mechanism of action also [...] in this
method of drug activity-gene expression correlation.

The same study also used a clustered image map to
visualise the data and summarise the relationship between
drug activity and gene expression. Two examples, 5-fluorouracil
and L-asparaginase, were cited. Eighteen cell lines,
including all seven colon cell lines, showed sensitivity
to 5-fluorouracil; of these, fourteen expressed low levels of
mRNA encoding dihydropyrimidine dehydrogenase, an enzyme
that catabolises 5-fluorouracil. Some cell lines, such as
acute lymphoblastic leukaemias, lack asparagine synthase
and require exogenous L-asparagine. L-Asparaginase depletes
extracellular L-asparagine and has been used in the
treatment of acute lymphoblastic leukaemia. Comparison of
L-asparaginase sensitivity and asparagine synthase gene
expression revealed a moderate negative correlation for the
entire cell line panel; however, the leukaemic subpanel gave
a very high negative correlation coefficient. The data
described in the above study are available in a
database (http://dtp.nci.nih.gov/) that can be searched using a
number of different parameters. For example, searching the
database for a relationship between epidermal growth factor
receptor expression and increased cytotoxicity revealed the
top five hits identified by the study of Wosikowski et al.
cited earlier ([9]; Clarke PA, unpublished observations).
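
The contrast between a moderate correlation across a whole panel and a strong correlation within one subpanel can be illustrated with a small calculation. The sketch below uses invented numbers for ten hypothetical cell lines; none of the values are taken from the study discussed above, and the variable names are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def expression_vs_sensitivity(rows):
    """Correlate gene expression with drug sensitivity over the given rows."""
    expression = [expr for _, expr, _ in rows]
    sensitivity = [sens for _, _, sens in rows]
    return pearson(expression, sensitivity)


# Hypothetical panel: (tissue, expression of a resistance-associated gene,
# sensitivity to the drug). Higher sensitivity means more growth inhibition.
panel = [
    ("leukaemia", 1.0, 9.0),
    ("leukaemia", 2.0, 7.0),
    ("leukaemia", 3.0, 5.2),
    ("leukaemia", 4.0, 3.0),
    ("colon", 2.0, 3.0),
    ("colon", 3.5, 4.5),
    ("breast", 1.5, 4.0),
    ("breast", 4.5, 3.5),
    ("lung", 3.0, 2.5),
    ("lung", 2.5, 5.0),
]

r_all = expression_vs_sensitivity(panel)
r_leukaemia = expression_vs_sensitivity(
    [row for row in panel if row[0] == "leukaemia"]
)

print(f"whole panel:        r = {r_all:.2f}")
print(f"leukaemic subpanel: r = {r_leukaemia:.2f}")
```

In this toy data set the relationship is diluted in the full panel because the other tissues show no consistent dependence on the gene, which is one reason subpanel analysis can be more informative than a single panel-wide coefficient.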

In some ways, the study of Scherf et al. [63] was analogous
to gene expression profiling studies in clinical tumors
that look for predictors of classification or outcome.
However, there are several potential limitations associated with
this study: (a) the cell lines have been selected for growth in
culture and should really be considered surrogates for in situ
tumors; (b) the database is generated from a single measurement
of growth delay at 48 hr, which is a measure of short-term
growth inhibition and/or cytotoxicity; and (c) the
relationship between drug activity and expression is correlative and
not necessarily causal, as changes in gene expression
following drug exposure were not measured. Most of the
drugs, but not all, clustered by presumed mechanism of
action. This may be due to experimental variability
or the loss of information as a result of compressing 60
dimensions of drug activity across the cell lines into one
dimension. In addition, only 20-30% of the possible genes
encoded by the human genome were examined; thus, it is
possible that clustering may be further improved by the inclusion
of a greater number of genes or potentially more cell lines.
Other effects, such as incorrect or incomplete assignment of
drug mechanism, influences of transporters/efflux pumps,
metabolizing enzymes and other sensitivity/resistance
genes, and secondary off-target effects of the agents, may
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also influence clustering. It remains to be seen whether this 
[...] to predict [...]. A
benefit of establishing databases, such as those described
above, is that they allow comparison
with data for novel compounds with unknown molecular
targets. This facility is already available for
cytotoxicity and specific molecular target expression
using the COMPARE algorithm ([...]). The
addition of global gene expression data should
enhance the power of this informatics tool. However, the next
major goal will be the establishment of a database of
expression profile changes that occur in response to treatment
with anticancer agents or following drug target modulation
by genetic means (see below).
As described earlier, the entire gene complement of S.
cerevisiae has been established, and its [...]
cyclin-dependent kinase inhibitors. Compounds based on a
substituted purine demonstrated some selectivity towards [...]
complexes and inhibited the activity of the yeast
cyclin-dependent kinases, cdc28p and pho85p. The gene expression
response of yeast cells following exposure to an existing
cyclin-dependent kinase inhibitor [...],
compound 52 (a [...] cyclin-dependent kinase
inhibitor), and 52Me, an [...],
respectively. Of the genes that changed, 63 were altered [...]
increased expression of metabolic [...]
products were associated with [...]
manipulating cyclin-dependent kinase activity were also
tested by profiling the expression of a cdc28 [...]
in these experiments was important as [...]
detected, suggesting that [...]
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kinase inhibition. Although there were some similarities in 
the gene expression profile between small molecule inhibi- 
tion and genetic inhibition, the differences between the two 
strategies suggest that the small molecules may have addi- 
tional off-target effects or that there may be an intrinsic 
methodological difference between chemical and genetic 
manipulation of a target. Dose-response effects may also be 
involved. 

A comparison between manipulating a target by either 
genetic or small molecule approaches was also carried out 
by Marton et al. [65], who used various deletion mutants of
S. cerevisiae to compare the effects of specific mutations 
with those of known inhibitors of the same pathway. The 
calcineurin-signaling pathway, which can be inhibited by 
FK506 or cyclosporin A, and inhibition of his3 by 3-amin- 
otriazole, were used as model pathways. The gene expres- 
sion changes induced by the inhibitors were examined in 
wild-type yeast and compared with the effects seen in mu- 
tant isogenic strains lacking the calcineurin catalytic sub- 
units or his3. In both cases, there was a correlation between 
drug inhibition of the target and inactivation of the target by 
mutation. A more sophisticated approach using a mecha- 
nism 'decoder' strategy could also be applied. Initially, the 
expression profile of drug-treated wild-type cells is com- 
pared with the expression profile of a panel of mutant 
strains. The mutant strains with the most similar expression 
profile are then selected and treated with the drug. For a 
perfect drug with absolute specificity, treatment of a mutant 
strain that lacks the drug target should not alter the gene 
expression profile. However, in reality no drug shows per- 
fect target specificity, so any changes in expression profile 
detected in the drug-treated mutant will give clues to off- 
target effects. Treatment with FK506 gave a similar profile 
to the calcineurin mutants; however, subsequent treatment 
of these mutant strains with FK506 altered gene expression 
in a manner consistent with an off-target effect. This profile 
corresponded to an effect dependent on the gcn4 transcriptional
activator. The expression of these genes was unchanged
following FK506 treatment of gcn4-null cells, confirming
the requirement for gcn4. An additional subset of
genes encoding drug efflux pumps, similar to the multidrug 
resistance family of proteins, were still induced in the gcn4- 
null cells, suggesting a secondary off-target effect. 
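
The two steps of the 'decoder' strategy described above can be sketched as follows: first find the mutant whose expression profile most resembles that of drug-treated wild-type cells, then treat that mutant with the drug and list the genes that still change. This is a minimal illustration under stated assumptions; the gene names, strain labels, log2 ratios, distance measure, and 2-fold cut-off are all hypothetical and do not reproduce the analysis in the cited study.

```python
from math import sqrt


def distance(profile_a, profile_b):
    """Euclidean distance between two profiles over their shared genes."""
    shared = profile_a.keys() & profile_b.keys()
    return sqrt(sum((profile_a[g] - profile_b[g]) ** 2 for g in shared))


def closest_mutant(drug_profile, mutant_profiles):
    """Step 1: the mutant whose profile best matches the drug signature."""
    return min(
        mutant_profiles,
        key=lambda name: distance(drug_profile, mutant_profiles[name]),
    )


def off_target_candidates(drug_in_mutant, threshold=1.0):
    """Step 2: genes that still respond to the drug when its presumed
    target is absent (|log2 ratio| above threshold, i.e. over 2-fold)."""
    return sorted(g for g, v in drug_in_mutant.items() if abs(v) > threshold)


# Hypothetical log2 ratios (treated or mutant versus untreated wild type).
drug_in_wild_type = {"g1": 2.1, "g2": -1.8, "g3": 0.1, "g4": 1.5, "g5": 0.0}
mutant_profiles = {
    "target_mutant": {"g1": 2.0, "g2": -2.0, "g3": 0.0, "g4": 0.1, "g5": 0.1},
    "other_mutant": {"g1": 0.0, "g2": 0.2, "g3": 1.9, "g4": 0.0, "g5": -1.7},
}
# Drug-treated target mutant versus the untreated target mutant.
drug_in_target_mutant = {"g1": 0.1, "g2": -0.1, "g3": 0.0, "g4": 1.4, "g5": 0.1}

best = closest_mutant(drug_in_wild_type, mutant_profiles)
print("most similar mutant:", best)
print("off-target candidates:", off_target_candidates(drug_in_target_mutant))
```

In this toy example g1 and g2 no longer respond once the target is missing, so they are attributed to the intended target, whereas g4 still responds and is flagged as a possible off-target effect.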

Hughes et al. [30] extended this approach using a 'compendium'
of expression profiles to demonstrate that the gene
expression profile of a mutation successfully serves as a 
molecular phenotype that corresponds, in turn, to a pheno- 
type defined by conventional biochemical assays. They 
achieved this by creating a reference database of expression 
profiles from 300 full genome expression profile experi- 
ments using S. cerevisiae mutated in 276 ORFs and main- 
tained in a single growth condition. Although not every 
mutation gave an identifiable phenotype in this single 
growth condition, the expression of at least one gene, other 
than the deleted gene, changed by more than 2-fold. As 
described earlier, a large number of parallel negative controls
of untreated isogenic yeast cultures were also profiled
to ascertain whether particular transcripts had an inherent 
fluctuation that exceeded that for other genes. In each one of 
these negative control experiments, at least one gene 
changed by more than 2-fold. Hierarchical clustering of this 
dataset identified genes that were regulated by nutritional 
status or stress. These genes displayed small magnitude, 
coordinated differences between seemingly identical control 
cultures, and were thought to represent biological noise 
within the culture system. The data generated were then 
used to generate an error model that could provide biolog- 
ical correction. In some mutants with no apparent growth 
defect, these fluctuations accounted for virtually all of the 
more than 2-fold changes; however, of those mutants that 
lacked a growth defect, a third had at least five genes whose 
expression was altered significantly when the gene error 
model correction factor was applied. Of those mutants that 
affected growth, 90% showed significant changes in the 
expression of at least five genes. Thus, the single growth 
condition evoked a response in about half of the mutants 
studied. 
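
The idea behind a gene-specific error model can be illustrated with a simplified sketch. Here the noise of each gene is estimated from control-versus-control comparisons, and a change in a mutant is called significant only if it is large relative to that gene's own noise. This is an assumption-laden stand-in, not the error model of Hughes et al. [30]; the gene names, log2 ratios, cut-off of 3, and noise floor of 0.1 are all hypothetical.

```python
from math import sqrt


def gene_noise(control_log_ratios):
    """Root-mean-square log2 ratio of each gene across control-versus-control
    experiments, where the true ratio is assumed to be zero."""
    return {
        gene: sqrt(sum(v * v for v in values) / len(values))
        for gene, values in control_log_ratios.items()
    }


def significant_genes(profile, noise, z_cutoff=3.0, noise_floor=0.1):
    """Genes whose log2 ratio exceeds z_cutoff times their own noise.
    noise_floor stops very quiet genes from dominating the results."""
    hits = []
    for gene, log_ratio in profile.items():
        scale = max(noise.get(gene, 0.0), noise_floor)
        if abs(log_ratio) / scale > z_cutoff:
            hits.append(gene)
    return sorted(hits)


# Hypothetical log2 ratios from four control-versus-control experiments.
controls = {
    "stress_gene": [1.2, -1.1, 0.9, -1.3],   # fluctuates between cultures
    "quiet_gene_1": [0.1, -0.1, 0.05, -0.05],
    "quiet_gene_2": [0.1, 0.0, -0.1, 0.1],
}
# Hypothetical mutant-versus-wild-type profile.
mutant = {"stress_gene": 1.4, "quiet_gene_1": 0.7, "quiet_gene_2": 0.1}

noise = gene_noise(controls)
by_error_model = significant_genes(mutant, noise)
by_two_fold = sorted(g for g, v in mutant.items() if abs(v) > 1.0)

print("error model:", by_error_model)
print("2-fold rule:", by_two_fold)
```

In this example the fixed 2-fold rule selects only the noisy stress gene, while the noise-scaled rule rejects it and instead picks out a stable gene that changed by about 1.6-fold (a log2 ratio of 0.7).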

Several classes of co-regulated genes were identified, 
e.g. those whose products were involved in ergosterol
biosynthesis, mitochondrial respiration, protein kinase
C/calcineurin signaling, amino acid biosynthesis, DNA damage/S
phase arrest, and mating [30]. A general observation was 
that specific mutations altered the expression of genes 
whose products were involved in the same cellular process 
that was affected by the mutation; also, different mutants 
that affected the same pathway frequently exhibited similar 
gene expression profiles. Additionally, cells treated with a 
relevant small molecule inhibitor showed a gene expression 
profile similar to that resulting from mutation of the target; 
for example, inhibition of HMG-CoA reductase by lovastatin
gave a similar profile to an hmg2 mutant. This important
study also demonstrated that the function of some
unknown ORFs could be predicted by cluster analysis, and 
several predictions were confirmed subsequently by bio- 
chemical analysis. One example was the clustering pattern 
of an ORF, which suggested a role in sterol biosynthesis; 
this was confirmed biochemically and by complementation 
with its human homologue. Another example was revealed 
by exposure to dyclonine, an anesthetic, the gene expression
profile of which resembled those of mutations that
affected the ergosterol pathway. Biochemical analysis
confirmed that this pathway was inhibited by dyclonine. One
mutant, erg2, was hypersensitive to the drug, while
overexpression of erg1 resulted in decreased drug sensitivity. In
contrast to erg2, other mutants of this pathway were
unaffected by dyclonine. Biochemical analysis demonstrated
that both erg2 mutants and dyclonine-treated cells accumulate
the same intermediates of this biosynthetic pathway.
The human homologue of erg2 is the sigma receptor, a
neurosteroid-interacting protein that regulates K+
conductance and binds several neuroactive drugs, such as
haloperidol. Erg2 was also inhibited by haloperidol, an observation
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consistent with the sigma receptor and erg2 being related 
gene products. 

There were also many genes whose expression varied by 
less than 2-fold. In expression profiling experiments, 
changes of this magnitude are largely ignored as they are 
considered unreliable. However, it has been widely recog- 
nised that changes of this degree may still be important, and 
there is a requirement for protocols to identify noise and 
bias and to assess whether these small but potentially critical
changes are biologically significant. Hughes et al. [30]
used their error model to apply a mask to the data, and this 
facilitated the reliable identification of genes whose expres- 
sion varied by 1.5-fold. This strategy successfully assigned 
eight previously uncharacterised ORFs to four pathways, a 
result that would have been unlikely had conventional cut- 
off parameters been employed. 

Although by no means fully comprehensive, the yeast 
study has been discussed in detail because it has established 
the principle of the method. A crude estimate anticipates 
that a total of 300-700 distinct, full genome transcription 
patterns would be obtained from the complete set of gene 
mutations in yeast, when profiling under a single condition 
[30]. However, one major limitation is the use of a single 
growth condition, as only half the mutations give a re- 
sponse. Therefore, it may be necessary to look at the re- 
maining mutations under more restrictive conditions when 
establishing a panel of conditions and mutations.

Application of this approach to mammalian cells will 
present additional challenges. One major challenge will be 
the production of 'targetless' cells that are either permanent 
knockouts or alternatively conditional knockouts that can be 
switched on and off. Possible strategies could include the 
use of antisense gene or oligonucleotide technology, the 
expression of dominant-negative inhibitors, or alternatively 
the generation of gene knockouts by homologous recombi- 
nation. In addition, gene expression pattern is dependent 
upon the tissue of origin [34], with the additional compli- 
cation that there can be considerable variation between cell 
lines from the same originating tissue. Screening the growth 
inhibitory activity of anticancer agents against the National 
Cancer Institute panel of 60 cell lines has already demon- 
strated that cell lines derived from the same class of cancer 
or tissue type can exhibit a wide range of drug sensitivities 
[63]. Similarly, there are also cell-line dependent differ- 
ences in the changes in global gene expression profile in 
response to a given drug. For example, microarray studies in 
our laboratory have shown that treatment of four different 
human colon cancer cell lines with an hsp90 molecular 
chaperone inhibitor results in differing gene expression pro- 
files in response to treatment [66]. Thus, care must be taken 
when choosing a cell line(s) for profiling experiments. 

Despite the potential complications of examining drug 
action in mammalian cells, a number of simple studies have 
investigated the utility of global gene expression profiling 
following exposure of cancer cell lines to individual agents. 
The response to DNA damage has been assessed following 
exposure of human myeloid cells to methanosulfonate, γ 
irradiation, or UV irradiation [67]. Genes expected to be 
induced by these treatments, such as those regulated by p53, 
were detected; in addition, the expression of several other 
genes previously unassociated with DNA damage has also 
been shown to be responsive to genotoxic stress. Also of 
significance was the observation that gene expression 
changes in response to the DNA-damaging treatments var- 
ied widely between cell types, indicating that, as discussed 
above, cellular context can play an important role in thera- 
peutic response. Other studies have also examined the ef- 
fects on gene expression profile of agents that either directly 
or indirectly result in DNA damage. Kudoh et al. [68] 
compared doxorubicin-sensitive and -resistant forms of the 
MCF-7 breast cancer cell lines. Many genes were altered 
following exposure of the sensitive line to doxorubicin; in 
contrast, the resistant line exhibited fewer changes, although 
the genes that did change were also altered in the sensitive 
line. These included the induction of epoxide hydrolase, a 
drug-metabolising enzyme frequently overexpressed in 
breast and other tumors. Analysis of two fibrosarcoma cell 
lines that were resistant to a diverse range of DNA-inter- 
acting agents demonstrated altered expression of genes in- 
volved in DNA repair and replication, signal transduction, 
cell cycle control, and transcription [69]. 

Studies of agents other than DNA-interactive drugs are 
now appearing in the literature. One study investigated the 
effects of a non-specific protein kinase inhibitor, staurospor- 
ine, on interleukin-3-dependent murine pro-B cells, and 
compared the gene expression pattern to that following the 
induction of apoptosis subsequent to interleukin-3 depriva- 
tion [70]. A number of genes were altered following death 
induced by both stimuli. Interestingly, the apoptotic stimuli 
influenced a number of genes previously unlinked to cell 
death pathways. For example, staurosporine treatment 
caused the induction of genes involved in inflammation. 

In our own laboratory, we have used microarrays to 
examine alterations in gene expression pattern following 
exposure to the novel agent 17AAG [66]. This is the first 
hsp90 molecular chaperone inhibitor to enter clinical trial 
and shows considerable potential as an anticancer agent 
because of its ability to reduce the cellular levels of several 
important oncogenic hsp90 client proteins [66,71,72]. The 
expression profile of a number of human colon adenocarci- 
noma cell lines was obtained following exposure to 
17AAG. The response of each cell line to this agent varied 
widely at the gene expression level and also at the protein 
level, again indicating that cellular context has an important 
role in response to anticancer agents. A number of interest- 
ing changes were detected. These included induction of 
hsp90, the molecular target of the drug, in cell lines that had 
reduced sensitivity to 17AAG, contrasting with low hsp90 
expression in cell lines that are particularly sensitive to 
17AAG (Fig. 4). Other gene expression changes, such as 
those in cytoskeletal and signaling genes, also showed con- 
siderable variation between different cell lines. Of particular 

Fig. 4. Demonstration of the use of microarrays [...] effects of 
the hsp90 inhibitor 17AAG [...]. This figure demonstrates the 
biological [...] in two human colon cancer cells and shows the 
[...] hsc/hsp70 and hsp90β at the RNA and protein level. [...] 
processed array data from HT29 and HCT116 [...] adenocarcinoma 
cells treated with 0.5 and 1 [...] 17AAG, respectively. The four 
columns on the left [...] phosphorimage data following a 24-hr 
treatment [...]; the four pseudo-coloured columns on the 
right-hand side [...] each time point. Green = increased by 
17AAG treatment, yellow = unchanged, red = decreased by 17AAG 
treatment, [...] = undetectable. (B) Western blotting of c-raf-1, 
hsc/hsp70, [...] and GAPDH expression following 17AAG treatment. 
[...] is depleted at 24 hr, but [...] 48 hr [...]; c-raf-1 is 
depleted at both 24 and 48 hr. Hsc/hsp70 [...] in the more 
sensitive HT29 cell line [...]. 

[...] chaperone gene [...] (hsp70 and hsc70) [...] was confirmed. 

[...] blood [...]. Using a combination of microarray and 
western blotting analysis, we defined a molecular signature 
of hsp90 inhibition, consisting of hsp70 induction and c-raf-1 
depletion. This signature is being used to provide a [...] 
read-out of hsp90 inhibition in [...] and tumor biopsies in 
our [...] by microarray analysis [...] also be determined [...] 
compare inactive and [...]. These changes were seen with both 
17AAG and radicicol, and thus were independent of chemical 
[...] (Fig. 5); however, additional changes in [...] ([...] PA 
and Workman P, unpublished observations). Such information is 
extremely useful during drug development, as it [...] chemist 
[...] to which chemical [...] and, hence, therapeutic [...]. 

Fig. 5. Validation at the protein level of hsp70 as a gene and 
pharmacodynamic marker responsive to inhibition of hsp90 by 
17AAG. A2780 ovarian adenocarcinoma cells were treated with 
equimolar doses of 17AAG and an inactive analogue of 17AAG, 
[...] dose of radicicol, a structurally dissimilar [...]. 
Western blots show that the two active compounds depleted 
c-raf-1 and induced hsp70, while the inactive 17AAG derivative 
did not [...] and did not induce hsp70 significantly. The [...] 
is derived from hierarchical clustering of microarray [...] 
expression data from a parallel experiment. This demonstrates 
that the structurally dissimilar but active hsp90 inhibitors 
(17AAG and radicicol) [...] expression profiles than [...]. 

Although currently there is little [...] in the literature, 
another [...] this approach, particularly for pharmaceutical 
[...], [...] will require a comprehensive [...] reference 
compounds with known pharmacological and toxicological 
properties. Once this is established, [...] a new compound 
could be compared [...] to predict compound-related or 
mechanism-based toxicity. 

[...] strategy using [...] with unknown [...] pharmacodynamic 
or [...] markers, (c) distinguish between active and inactive 
analogues in structure-activity relationship studies, [...] 
clues about genes involved in drug [...]. [...] application of 
[...] caveats that will need to be addressed. [...] minimum 
2-fold cut-off for gene expression [...] more subtle changes 
below this threshold [...] important [...]. This may be 
especially relevant in view of frequent reports which suggest 
that test:reference ratios for [...] from arrays are generally 
lower than [...] from more conventional techniques, such as 
northern blotting or RT-PCR. Another complication is the 
ability to distinguish 
between primary effects on the drug target and those that 
arise as a downstream consequence of [...] 
that follow inhibition of the target activity. For example, a 
cyclin-dependent kinase inhibitor could directly influence 
transcription factor activity and gene expression; [...] 
downstream biological effects of cyclin-dependent kinase 
inhibition may include cell cycle arrest, an event that will 
also change the expression of cell cycle-regulated genes, or 
apoptosis, which will also lead to gene expression [...]. 
The establishment of comprehensive [...] 
biological outcomes, should eventually allow [...] distin- 
guish between the gene expression changes that are either 
immediately dependent upon, or [...] 
of, target inhibition. Another issue in the use of microarrays 
during drug development relates to the observation that 
gene expression varies widely between cell types, and in 
most cases these differences are dominant over changes 
induced by drug treatment. Therefore, choice of cell type is 
an issue, and the use of a standard cell line panel [...] 
essential. In many experiments, it is clear that the [...] 

[...] easy to sample for tumor biopsy. Two key studies, discussed 
in detail below, have taken advantage of this and have tested 
a number of hypotheses concerning gene expression profil- 
ing and the molecular classification of cancer. 

Although the distinction between the acute leukaemias 
(ALL and AML) is well established, there is no single test 
to distinguish between ALL and AML. Classification still 
requires interpretation of morphology, histochemistry, im- 
munophenotyping, and cytogenetics. Distinguishing be- 
tween these two leukaemias is critical as their treatment 
regimens are different. Moreover, despite improved therapy 
and better response rates, patients that relapse have a poor 
outlook, and mortality from treatment is still a factor. Golub 
et al. [76] set out to determine whether gene expression 
profiling by microarray could be used to classify leukaemia 
samples. The gene expression profile of 6817 genes was 
measured in bone marrow samples from 27 ALL and 11 
AML patients using a high-density oligonucleotide array. 
The data were initially analysed by neighborhood analysis, 
an approach that defines the idealised expression pattern 
corresponding to a gene that is uniformly high in one con- 
dition and low in another. The dataset was then tested to find 
whether there was a high density of other genes nearby, as 
compared to equivalent random patterns. For the samples 
analysed, approximately 1100 genes were better correlated 
with the AML-ALL class distinction than would be ex- 
pected to happen by chance alone. The known samples were 
then analysed to create a class predictor, capable of assign- 
ing a new sample to one of the two classes. A procedure was 
developed with a set number of informative genes that could 
potentially distinguish between classes, so that the expres- 
sion of these genes in a new sample of unknown pathology 
could be used to predict its classification. In effect, each 
informative gene casts a weighted vote in favor of one or 
another classification, and the sum of these votes is used to 
set a threshold. The informative genes included: cell surface 
markers for which known antibodies are used; genes critical 
for S-phase progression; chromatin remodeling genes; cell 
adhesion genes; and some known oncogenes. Of the 38 
samples examined, the 50 genes that correlated most closely 
with the AML-ALL distinction could successfully predict 
the classification of 36/38 samples. In the next stage, 34 
unknown samples from peripheral blood and bone marrow 
were examined from several different reference laborato- 
ries. The gene predictor successfully identified 29 of 34 
samples with high prediction scores. Interestingly, one lab- 
oratory used a different sample preparation protocol that 
resulted in lower predictive strengths, implying that standardi- 
sation of protocols will be essential for studies of this type. 
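
The weighted-voting procedure described above can be made concrete with a short sketch. The formulae are not given in this review. The code assumes the form usually described for this type of predictor: each gene is weighted by a signal-to-noise score (difference of class means divided by the sum of class standard deviations), votes according to how far the new sample lies from the midpoint of the two class means, and a call is made only when the margin between the winning and losing vote totals exceeds a prediction-strength threshold. The threshold of 0.3, the gene names, and the expression values are illustrative assumptions.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """Difference of class means scaled by the summed class standard deviations."""
    spread = stdev(values_a) + stdev(values_b)
    if spread == 0:
        return 0.0  # a gene with no variation carries no usable weight
    return (mean(values_a) - mean(values_b)) / spread


def train_predictor(expr, labels, class_a, class_b, n_genes):
    """expr: {gene: [value per sample]}; labels: [class of each sample]."""
    idx_a = [i for i, c in enumerate(labels) if c == class_a]
    idx_b = [i for i, c in enumerate(labels) if c == class_b]
    scored = []
    for gene, values in expr.items():
        a = [values[i] for i in idx_a]
        b = [values[i] for i in idx_b]
        weight = signal_to_noise(a, b)
        boundary = (mean(a) + mean(b)) / 2
        scored.append((gene, weight, boundary))
    # Keep the genes most strongly correlated with the class distinction.
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:n_genes]


def predict(predictor, sample, class_a, class_b, min_strength=0.3):
    """Return (call, strength); call is None when the margin is too small."""
    votes_a = votes_b = 0.0
    for gene, weight, boundary in predictor:
        vote = weight * (sample[gene] - boundary)
        if vote > 0:
            votes_a += vote
        else:
            votes_b += -vote
    total = votes_a + votes_b
    if total == 0:
        return None, 0.0
    strength = abs(votes_a - votes_b) / total
    winner = class_a if votes_a > votes_b else class_b
    return (winner if strength >= min_strength else None), strength


# Hypothetical training data: three ALL and three AML samples, three genes.
expr = {
    "g1": [9.0, 8.5, 9.5, 2.0, 2.5, 1.5],  # high in ALL
    "g2": [1.0, 1.5, 0.5, 7.0, 8.0, 7.5],  # high in AML
    "g3": [5.0, 4.0, 6.0, 5.5, 4.5, 5.0],  # uninformative
}
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
predictor = train_predictor(expr, labels, "ALL", "AML", n_genes=2)

clear_call, clear_strength = predict(predictor, {"g1": 8.0, "g2": 2.0}, "ALL", "AML")
mixed_call, mixed_strength = predict(predictor, {"g1": 8.0, "g2": 7.0}, "ALL", "AML")
```

In this toy example the first sample is called ALL, whereas the second, in which the two informative genes point in opposite directions, is left unassigned. This mirrors the way low prediction scores were treated as uncertain rather than forced into one class.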

Following this initial success, the approach was then 
used to examine response and outcome following treatment 
with an anthracycline plus cytosine arabinoside regimen. 
Fifteen AML patients with long-term follow-up were exam- 
ined. However, neighborhood analysis could not predict 
response, and no strong multigene signatures that correlated 
with outcome were identified. Explanations for the failure to 
predict outcome could be that potentially informative tran- 
sient changes in gene expression profile were missed, as 
only pre-treatment samples and not post-treatment were 
examined, or that the relapsing clone had grown out from a 
single or a small group of cells, the expression profile of 
which was masked by the bulk of the leukaemic cells. 

Another, perhaps more important, question was also 
asked: Could gene expression profiling alone be used to 
classify and distinguish between ALL and AML without 
prior knowledge of the pathology? Two steps were neces- 
sary: the first was to determine algorithms to cluster the data 
by gene expression and the second was to determine 
whether the classes identified by clustering are real rather 
than a result of random aggregation or factors such as 
differences in sample isolation, storage, or preparation. 
Self-organising maps were used to establish the optimal set 
of centres around which to cluster the data. The total dataset 
was then clustered to the nearest centre. Employing a two- 
centre model, 24/25 ALL and 10/13 AML cases were iden- 
tified correctly. Using a 20-gene predictor based on these 
data, 34/38 samples were assigned correctly, with one error 
and three unassignable samples. With a four-centre model 
the samples were subdivided into AML, T cell ALL, and 
two different B cell ALL classes; the significance of the two 
B cell ALL classes was unknown, but these could represent 
different transformation mechanisms. 
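
As an illustration of clustering without prior class labels, the following is a minimal sketch of a one-dimensional self-organising map with two centres. It is not the implementation used in the study. The learning-rate and neighbourhood schedules, the number of epochs, the seeded initialisation, and the toy expression values are all assumptions chosen for brevity.

```python
import math
import random


def nearest_node(nodes, sample):
    """Index of the node (centre) closest to the sample in Euclidean distance."""
    return min(range(len(nodes)), key=lambda j: math.dist(nodes[j], sample))


def train_som(samples, n_nodes=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organising map and return the node weight vectors."""
    rng = random.Random(seed)
    # Initialise each node from a distinct, randomly chosen sample.
    nodes = [list(s) for s in rng.sample(samples, n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.05)   # neighbourhood shrinks over time
        for x in samples:
            bmu = nearest_node(nodes, x)         # best-matching unit
            for j, node in enumerate(nodes):
                # Nodes near the best-matching unit on the map are pulled along too.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(len(node)):
                    node[k] += lr * h * (x[k] - node[k])
    return nodes


# Hypothetical expression profiles (three genes) for two unlabelled groups.
group_1 = [[9.0, 1.0, 5.0], [8.5, 1.5, 4.0], [9.5, 0.5, 6.0]]
group_2 = [[2.0, 7.0, 5.5], [2.5, 8.0, 4.5], [1.5, 7.5, 5.0]]
samples = group_1 + group_2

nodes = train_som(samples)
assignments = [nearest_node(nodes, s) for s in samples]
```

Each sample is finally assigned to its nearest centre. With well-separated toy groups the two centres settle on the two groups, but whether clusters found in real data reflect biology rather than sample handling still has to be tested separately, as noted above.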

Finally, this study also presented an interesting anecdotal 
case of a patient with a leukaemic presentation that was 
diagnosed as AML with atypical morphology. The gene 
expression profile of a marrow sample from this patient was 
analysed using the class predictor. This gave a low predic- 
tion score for both AML and ALL, as neither lymphoid- nor 
myeloid-specific genes were expressed. In fact, the gene 
expression pattern detected was more typical of mesen- 
chymal tissue, such as muscle. Cytogenetics subsequently 
identified a t(2;13)(q35;q14) translocation characteristic of 
alveolar rhabdomyosarcoma, a muscle tumor. 

In the second study, Alizadeh et al. [77] asked whether 
expression profiling could be used to generate a molecular 
portrait of distinct types of B cell malignancy and whether 
distinct B cell malignancies not recognised by current clas- 
sification systems could be identified. Despite the clinical, 
morphological, and molecular parameters used to classify 
lymphoma, patients with similar diagnoses can experience a 
very different response to treatment. Lymphoma classifica- 
tion has steadily evolved; however, it was significant that 
recent schemes such as the Revised European-American 
Lymphoma Classification scheme unified various morpho- 
logical subtypes into single groups despite the suspicion that 
they 'include more than one disease entity' [77,78]. Each of 
the currently recognised categories of B cell malignancy can 
be traced to a particular stage of B cell differentiation, 
although the extent to which this relationship is maintained 
in the malignant cell is unclear. DLCL cells have hypermu- 
tated immunoglobulin genes, implying that they arise from 
germinal centre B cells or a later stage of differentiation. 

[...] samples clustered with normal breast tissue, an observation 
consistent with the fact that these tumors had responded to 
treatment and there was no tumor in the sample. 

The largest gene cluster was a proliferation cluster that 
correlated with common immunohistochemical markers of 
proliferation and increased mitotic index. Other clusters 
included a large group of interferon-regulated genes, a 
c-erbB2-related cluster of genes located to the region of 
chromosome 17 which is frequently amplified, and a cluster 
containing genes, including c-fos and junB, that were in- 
duced by prolonged handling of the sample. In addition, 
there were clusters of genes consistent with different cell 
types that may also be present in the tumor: these included 
endothelial cells, stromal cells, adipose-enriched/normal 
breast cells, B cells, T cells, and macrophages. The analysis 
also identified four tumor subgroups with expression pro- 
files characteristic of estrogen receptor positive/luminal ep- 
ithelial cells, basal-like epithelial cells, erb-B2 positive 
cells, and normal breast epithelial cells. These again implied 
the presence of diseases within disease. 

An additional breast cancer study has compared sporadic 
breast tumors to hereditary BRCA-1- and BRCA-2-related 
tumors [79]. Multidimensional scaling successfully sepa- 
rated the three types of tumor; of 3226 genes that were 
analysed, 51 were found to best differentiate between these 
different tumor types. Examination of the data suggested 
that BRCA-1-related tumors differed significantly from 
BRCA-2-related and sporadic tumors, exhibiting transcrip- 
tional activation of pathways involved in apoptosis and 
DNA repair. The BRCA-2-related and sporadic tumors ex- 
hibited similar profiles. Interestingly, one patient appeared 
to have been misclassified into the BRCA-1 group, but had 
no discernible mutation. Subsequent analysis revealed 
aberrant methylation of the BRCA-1 
promoter region. Hypermethylation of the BRCA-1 pro- 
moter is known to silence BRCA-1 expression, and the 
observation was corroborated when BRCA-1 expression was 
found to be low in this patient. These observations demon- 
strated that expression profiling could be used to character- 
ise BRCA-1 and BRCA-2 driven tumors and also demon- 
strated that they are molecularly distinct. 

A similar approach has been used in ovarian cancer, 
comparing normal and malignant ovarian tissue samples. 
Welsh et al. [80] used high-density oligonucleotide arrays 
to profile and classify 27 serous ovarian adenocarcinomas. 
The tumors could be split into several groups. One group 
clustered with normal ovarian tissue, expressed high 
levels of ribosomal genes, and contained tumors that 
were generally well-differentiated histologically. Another 
group clustered with the ovarian cell lines, were poorly 
differentiated, and underexpressed genes involved in metab- 
olism. A third group exhibited expression patterns consistent 
with the presence of stroma and activated B cells. Exclusion of 
the last group of biopsies resulted in a list of genes enriched for 
potential markers of epithelial ovarian malignancy. A separate 
study compared the gene expression profiles of serous and 
mucinous ovarian cancer biopsies [81]. Increased expres- 
sion of a number of genes common to both types of tumor 
was detected; in addition, expression patterns specific to 
serous and mucinous tumors were also detected. Both of 
these ovarian studies identified potential tumor-specific 
markers that will require extensive further validation. 

To try and avoid the problem of contamination by nor- 
mal cell types and yet still obtain sufficient material for 
analysis, Bittner et al. [82] profiled expression in melanoma 
tumor biopsies that were passaged in culture for 2-52 weeks 
(median 8 weeks passage) prior to analysis. Five samples 
were analysed straight from biopsy, and one sample was 
analysed directly from biopsy with a portion also cultured 
for 3 passages prior to profiling. Expression was compared 
to a standard cell line reference using an array of 8150 
elements, corresponding to 6971 unique genes. The data 
were analysed by hierarchical clustering using a matrix of 
Pearson correlation coefficients, and results were plotted 
using multidimensional scaling. A cluster containing 19 of 
the melanoma samples was identified, although no clinical 
or tumor cell characteristics were specifically associated 
with this cluster. Different samples taken from the same 
patient were the most closely related, even, in one case, 
when sampled a year apart. The patient population had 
uniformly poor prognosis, and outcome data were available 
for some patients in this study; ten were in the identified 
melanoma cluster and of these four died, whereas all four 
patients that were outside this cluster died. However, this 
difference was not significant. The extent to which melano- 
mas can be subclassified using gene expression microarrays 
remains to be elucidated, but the fact that this study could 
group the tumors by molecular markers suggests that such 
an approach does have promise. 
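
Hierarchical clustering on a matrix of Pearson correlation coefficients, as used in the melanoma study, can be sketched in a few lines. The example below is a simplified illustration rather than the published analysis. It assumes average linkage and a distance of one minus the correlation coefficient, and the sample labels and log-ratio values are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of samples using 1 - Pearson r as the distance."""
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        return mean(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Hypothetical log ratios (sample versus reference) for five genes.
profiles = {
    "M1": [2.0, 1.5, -1.0, -2.0, 0.5],
    "M2": [1.8, 1.7, -0.8, -2.2, 0.4],
    "M3": [2.2, 1.2, -1.2, -1.8, 0.6],
    "M4": [-1.5, -2.0, 1.8, 1.0, 0.2],
    "M5": [-1.7, -1.8, 2.0, 1.2, 0.0],
}
clusters = average_linkage(profiles, n_clusters=2)
```

Because the distance depends on the shape of the profile rather than its magnitude, samples with the same pattern of up- and down-regulation group together even if the absolute ratios differ. This is one reason why correlation-based measures are often preferred for data expressed relative to a common reference.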

A number of studies have addressed the problem of 
heterogeneity in solid tumors using the alternative approach 
of laser capture microdissection. Sgroi et al. [35] isolated 
approximately 100,000 normal, invasive, and metastatic cells 
using laser capture microdissection of a single breast tumor. 
Rather than using amplification protocols that may be prone to 
representative biases, total RNA was extracted and radiola- 
beled by reverse transcription primed using oligo(dT) and then 
further radiolabeled by second strand cDNA synthesis. This 
approach detected a number of tumor-specific genes, and the 
expression of several of these was subsequently confirmed 
by RT-PCR. While this study was restricted to a single 
tumor sample, it demonstrates that microdissection approaches 
can be feasible and potentially useful. Another study of several 
squamous cell carcinomas of the head and neck used a similar 
approach to identify genes showing altered expression that 
are involved in cell cycle regulation, signal transduction, 
angiogenesis, and cell death regulation [36]. 

It is clear from these studies in various tumor types that 
gene expression profiling of clinical biopsy samples by 
microarrays can be used to classify tumors and to indicate 
the presence of previously unidentified molecular subtypes. 
This approach may also provide information on the under- 

4. Conclusions and future directions 

4.1. Technological advances in the hardware 

The microarray field is evolving rapidly, and [...] tech- 
nologies, methods, and applications [...] 

[...] allowed simultaneous processing of all 49 samples in under 
1 week; this also had the advantage that all hybridisations 
were subject to identical conditions. Currently, the detection 
limit of arrays using an amplification step is of the order of 
five mRNA copies/cell. The development of better labeling 
and detection systems will also be required to improve 
sensitivity; in addition this will also reduce the amount of 
input RNA required for array experiments and would ide- 
ally obviate the need for an amplification step prior to 
labeling. Examples of new technologies in this area include 
the development of infrared excitable phosphor particles 
that exhibit a phenomenon known as up-conversion when 
illuminated, a process that has been reported to increase 
sensitivity over conventionally used fluors [88]. Alternative 
approaches include electronic DNA detection using capture 
probes attached via conducting 'molecular wire'. Hybridi- 
sation of cDNA labeled with ferrocene, which can transfer 
electrons, is detected by applying a voltage to this system 
[89]. Microtransponders consisting of a 0.0125 mm³ silicon 
photocell/radio transmitter attached to a nucleic acid probe 
are also being developed; following hybridisation the pho- 
tocell is activated by laser excitation of a fluorescently 
labeled cDNA and the resulting radio signal detected [89]. 
These approaches could be coupled with miniaturisation 
and microfluidics (which reduce sample and reagent con- 
sumption) and also the use of flow cells. Such integration of 
the many different features of array technology could pro- 
vide a single, simple-to-use piece of equipment [reviewed in 
Ref. 89]. 

4.2. Developments in software and bioinformatics 

Until recently, efforts have concentrated on developing 
array hardware. More recently, attention has begun to focus 
on what to do with the data and how to analyse and present 
it. Although considerable progress has been made, further 
major advances in data analysis and visualisation are still 
required. An example is the use of systems such as neural 
networks, which can be trained with established data and 
then applied to examine unknown systems. There are cur- 
rently a limited number of publicly available tools for data 
storage, processing, retrieval, and integration of microarray 
data in the context of existing knowledge. Efforts are now 
underway to establish public databases of gene expression 
profile results (e.g. see the Gene Expression Omnibus at 
http://www.ncbi.nlm.nih.gov/geo/ or ArrayExpress at 
http://www.ebi.ac.uk/arrayexpress/). Although there have been at- 
tempts to standardise data and experimental format (see 
http://www.mged.org/), there is still no consensus on how to 
merge data from different array types (such as oligonucle- 
otide versus cDNA, glass versus nylon, radioactive versus 
fluorescently labeled) or how to include data from alterna- 
tive gene expression profiling approaches, such as differen- 
tial display and serial analysis of gene expression. Other 
factors that also have to be considered and improved include 
the choice of reference RNAs and the standardisation of 
experimental design. In addition, not only will the sheer 
abundance of data generated by expression profiling require 
better methods for data handling, but this will also require 
us to rethink how we interpret the data and generate hy- 
potheses. Interpreting the biological significance of a gene 
cluster still presents a formidable challenge; however, a 
recent study has demonstrated methodologies for associat- 
ing microarray data with the literature so that expression 
patterns can be more rapidly understood [90]. 

Early gene expression profiling studies have come from 
an observational or at least a less hypothesis-driven ap- 
proach. The inferences that are drawn from expression pro- 
filing data are not the endpoint, as they require validation 
and further evaluation using biochemical approaches. Al- 
though it is conceivable that gene expression microarrays 
could be developed in such a way as to become routine 
analytical methods in their own right, at present they are 
mainly used as a screening tool with which to generate 
hypotheses. The need for validation of a high volume of 
findings will no doubt refocus efforts on the development or 
improvement of higher throughput cell-based assays that 
can assess multiple biochemical parameters. An interesting 
recent example was reported by Ideker et al. [91] who 
combined microarrays with quantitative proteomics and da- 
tabases of known physical interactions to build a biochem- 
ical pathway, test the hypothesis, and refine the model 
generated. 

4.3. Practical impact 

It is clear that gene expression microarrays are already 
having a major impact on cancer biology, pharmacology, 
and drug development. As reviewed here, there are many 
examples of this in the literature, and there will be an 
extensive body of unpublished information in the pharma- 
ceutical and biotechnology industries that is not available 
for proprietary reasons. The major limiting factor in the 
further application of microarrays is probably cost and ac- 
cess to the technology. Costs are likely to decrease and 
access will grow as a result of the expanding provision of 
core facilities and the increasing friendliness of the technol- 
ogy. Microarrays will continue to make a major contribu- 
tion to the progressively complete molecular description 
and understanding of the biology of normal and cancer cells. 
Our comprehension of the changes in gene structure and 
expression levels during cancer progression will become 
increasingly thorough over the next few years, so that un- 
derstanding the proteome, rather than the transcriptome, 
will become critical. An intermediate approach could be to 
analyse mRNAs being actively translated by isolating 
mRNA associated with the ribosomal apparatus [reviewed 
in Ref. 92]. 

Gene expression arrays have a key role to play in all 
phases of drug discovery and development (Fig. 1). This 
includes the identification and validation of new targets, the 
profiling of on-target and off-target effects during the op- 
timisation of new therapeutic agents, understanding molec- 
ular mechanisms of action and structure-activity relation- 
ships and the prediction of side-effects, and the discovery of 
diagnostic, prognostic, and pharmacodynamic biomarkers. 
Microarrays will be used increasingly during early clinical 
development to confirm that the desired mode of action is 
occurring and to profile the molecular factors responsible 
for drug sensitivity and toxicity. In addition, microarrays 
will have an important role to play in the molecular eluci- 
dation of drug resistance [93,94]. As the role of pharma- 
cogenomics becomes more extensive, it is not yet clear to 
what extent microarrays will be employed in the clinic, for 
example, in the diagnosis, selection, and monitoring of 
patients during cancer therapy. Will microarrays be used 
routinely or will they predominantly be employed to supply 
diagnostic, prognostic, and biomarker endpoints for more 
conventional analyses such as immunohistochemistry and 
ELISA assays? Most likely some combination of these will 
be used. 

Over the next 5-10 years we will have an increasing 
number of new molecular therapeutics targeted to the major 
abnormalities that are responsible for cancer progression. 
Current examples include Herceptin, Glivec (STI571), and 
Iressa (ZD1839) [7,8,47,73-75]. These new agents are 
likely to find optimal activity in particular subgroups of 
patients. Identifying these subgroups will be a challenge, 
and microarrays will play an important role in this process. 
There is even the potential to move towards the vision of 
individualised, genome-based, combinatorial cancer ther- 
apy, targeted to the genetics of particular patients [47,73- 
75]. Microarray analysis will play a leading role alongside 
other new technologies in testing the ability to achieve this 
vision. 